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A fast FFT-based discrete Legendre transform 

Nicholas HaleX and Alex TownsendX 
[Received on ; revised on ] 

An ^(A(logA/') 2 /loglogA) algorithm for the computation of the discrete Legendre transform and its in¬ 
verse is described. The algorithm combines a recently developed fast transform for converting between 
Legendre and Chebyshev coefficients with a Taylor series expansion for Chebyshev polynomials about 
equally-spaced points in the frequency domain. Both components are based on the FFT, and as an inter¬ 
mediate step we obtain an &(N\ogN) algorithm for evaluating a degree N — 1 Chebyshev expansion at 
an A-point Legendre grid. Numerical results are given to demonstrate performance and accuracy. 

Keywords', discrete Legendre transform; Legendre polynomials; Chebyshev polynomials; fast Fourier 
transform 


1. Introduction 

Given N values co,... ,cjv-i, the discrete Legendre transform (DLT) or finite Legendre transform calcu¬ 
lates the discrete sums 

N—l 

/t=Ic/„(rf), O^k^N-l, (1.1) 

n =0 

where P„(x) denotes the degree n Legendre polynomial and the Legendre nodes, 1 > xf g > ... > 
x l f s _ ! > —1, are the roots of Pn{x). The inverse discrete Legendre transform (IDLT), which computes 
co,... ,cat-i given /o,... ,/Jv-i. can be expressed as 

N-l 

Cn = (n + \ ) Yj w k S fk p n(*k g ), 0 < n < N - 1, (1.2) 

k =0 

where w ^ g ,... w^ g _ j are the Gauss-Legendre quadrature weights. It is the goal of this paper to describe 
fast algorithms for computing the DLT and IDLT. 

Expansions in Legendre polynomials can be preferred over Chebyshev expansions because Legendre 
polynomials are orthogonal in the standard Zr-norm. However, Legendre expansions are less convenient 
in practice because fast algorithms are not as readily available or involve a precomputational cost that 
inhibits applications employing adaptive discretizations. By deriving fast algorithms for the DLT and 
IDLT we take some steps towards removing this barrier and allowing Legendre expansions and adaptive 
discretizations at Legendre nodes to become a practical tool in computational science. 

Our approach makes use of the fast Chebyshev-Legendre transform described in (Hale & Townsend, 
2014), which is based on carefully exploiting an asymptotic expansion of Legendre polynomials and the 
fast Fourier transform (FFT). Using the FFT has advantages and disadvantages. On the one hand, fast 
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Fig. 1. Some existing fast algorithms related to the DLT and IDLT. Solid lines depict transforms discussed in this paper. The 
“non-uniform discrete cosine transform” (NDCT) is described in Section 2, the DLT in Section 3, and the IDLT in Section 4. 
This table is far from complete. Furthermore, some of the papers referenced belong to more than one of the connecting lines; for 
example, (Keiner, 2009) computes the DLT by combining a transformation from Legendre coefficients to Chebyshev coefficients 
with an NDCT. We have tried to include as many of the key papers as possible, without overcomplicating the diagram. 


and industrial-strength implementations of algorithms for computing the FFT are ubiquitous. On the 
other hand, the FFT is restricted to equally-spaced samples and is therefore not immediately applicable 
to situations like computing the DLT. In this paper we overcome the equally-spaced restriction by con¬ 
sidering Legendre nodes as a perturbation of a Chebyshev grid and employing a truncated Taylor series 
expansion. A similar approach for the discrete Hankel transform is described in (Townsend, 2015a). 

Examples of existing approaches that may be used or adapted to compute the DLT include butterfly 
schemes (O’Neil et al., 2010), divide-and-conquer approaches (Keiner, 2008; Tygert, 2010), and the 
oversampled non-uniform discrete cosine transform (Potts, 2003; Fenn & Potts, 2005; Keiner et al., 
2009). The algorithm presented here has three advantages: (1) It is simple (the main ingredients are just 
the FFT and Taylor approximations), (2) There is essentially no precomputational cost; and, (3) There 
are no algorithmic parameters for the user to tweak as they are all optimally determined by analysis. In 
particular, these final two points make the algorithm presented in this paper well-suited to applications 
where N is not known in advance or when there are multiple accuracy goals. 

It is interesting to note that before the work of Glaser et al. (2007) there was no known stable method 
for computing the Legendre nodes in fewer than (?{N 2 ) operations. 1 Therefore, prior to 2007, it seems 
likely that all DLT algorithms had a precomputational cost of at least ff(N 2 ). Nowadays, we believe 
that the algorithms described above, with the exception of (O’Neil et al., 2010), can be considered to 
have a lower precomputational cost. 

The paper is structured as follows: In the next section we develop an ff{N\ogN) algorithm for 
evaluating a Chebyshev series expansion at Legendre nodes (we refer to this as a “non-uniform discrete 
cosine transform” or NDCT). In Section 3 we combine this with the fast Chebyshev-Legendre transform 
from (Hale & Townsend, 2014) to obtain an C(/V(log A') 2 / loglog N ) algorithm for computing the DLT. 
A similar algorithm for the IDLT is described in Section 4. Throughout we use the following notation: 

*The current state of the art for computing the Legendre nodes is due to Bogaert (2014), whose ff{N) algorithm can be rapidly 
compute millions of nodes to 16-digits of accuracy in under a second. For a discussion of the history of computing Gauss- 
Legendre quadrature nodes and weights, see (Townsend, 2015b). 
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A column vector with entries vo,... ,va-i is denoted by y, a row vector by v T , a diagonal matrix with 
entries vo,..., v^-\ by D v , and the N xN matrix with (j.k) entry Pk{xj) by Pn(x). 

2. A non-uniform discrete cosine transform 

Before considering the DLT (1.1) we describe an 6’{N\ogN) algorithm for evaluating a Chebyshev 
expansion at Legendre nodes. 


A-1 

A= £c„r„(x^), O^k^N-l. (2.1) 

n =0 

This will become an important step in the DLT. Here, T n (x) = cosfncos* 1 x) is the degree n Chebyshev 
polynomial of the first kind and the sums in (2.1) may be rewritten as 


N -1 

fk= £ c„cos(«e^), O^k^N- 1, (2.2) 

n =0 

where xf g = cos ( o ! e8 ). If the Legendre nodes in (2.2) are replaced by the Chebyshev points of the first 
kind, i.e., 

xf ebl = cos( 0 c k hebl ) = cos , O^k^N-l, (2.3) 

then (2.1) can be computed in @(N\ogN) operations via a diagonally-scaled discrete cosine transform 
of type III (DCT-III) (Gentleman, 1972). Since (2.2) has a similar form to a DCT, but with non-equally- 
spaced samples 6 l k eg , we refer to (2.1) as a “non-uniform discrete cosine transform” (NDCT). 

Our algorithm to compute the NDCT is summarized as follows: consider the transformed Legendre 
nodes, G^ 8 ,..., Q l ^j g _ ( , as a perturbation of an equally-spaced grid, G ( j ..... 0^ ,, i.e., 

e[ eg = Q* k +8G k , O^k^N-l, 

and then approximate each cos (n6 k 8 ) term in (2.2) by a truncated Taylor series expansion about G k . If 
|50*| is small (see Section 2.2) then only a few terms in the Taylor series expansion are required. More¬ 
over, since the 0g,..., G£ / _ l are equally-spaced, each term in the series can be computed in &{N log A') 
operations via a DCT (see Section 2.3). The Legendre nodes, both x 1 q 8 ..... x l £ 8 _ , and G^ 8 ,..., 0 ^’ g ,, can 
be computed in f/(N) operations using the algorithm described in (Bogaert, 2014). 

2.1 Computing an NDCT via a Taylor series expansion 

As in (Hale & Townsend, 2014) we find it convenient to work in the 6 = cos" 1 * variable. There are 
three reasons for this: 

(1) The quantities 9 k eg can be computed more accurately than xf g (Swarztrauber, 2003). 

(2) The function cos 1 x is ill-conditioned near x = ±1. In particular, since 4 cos - 1 x = (1 — x 2 )“ l /2 , 

1 — (jp S ) 2 = ^((V* 2 ), and 1 — (x^Lj) 2 = <^(1V* 2 ), one can expect rounding errors that grow like 

& (N) when evaluating cos* 1 (xq 8 ) and cos* 1 (xj^ij). Furthermore, when evaluating 7)v-i (cos* 1 (x 1 q 8 )) 
and T/v-i (cos* 1 (x^Lj)) this rounding error can increase to (?(N 2 )\ 
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(3) The Taylor series of T„ (cos(0 + 50)) = cos(n(0 + 50)) about 0 involves simple trigonometric 
terms and is more convenient to work with than the ultraspherical polynomials that arise in a 
Taylor series of T n (.x + Sx) about x. 

The Taylor series expansion of T„( cos(0 + 50)) about 0 € [0, n] can be expressed as 

cosM0 + 50)) = fW*>(n0)^ = £(-l)^D/ 2 j^ ( „ 0 ) M^ i (2 . 4) 

e=o e=o 


where 


<P,(0) 


cos ( 0 ), l even, 
sin( 0 ), l odd. 


If 1501 is small then a good approximation to 7j,(cos(0 + 50)) can be obtained by tmncating this series 
after just a handful of terms. The following lemma determines the accuracy we can expect from taking 
the first L terms. 


Lemma 2.1 For any L ^ 1 and n ^ 0, 


Rlm,sg ■= max 

06[O,7T] 


L— 1 (5qY 

cos(n (0 + 50)) — E cos ^\n0) 

e=o 


t\ 


< 


(»|M|) Z 

LI 


Proof. By the mean-value form of the remainder we have 


0G[O,7E] Li L\ 


as required. □ 

For 0 ^ k ^ N — 1, we write 0^ = 0' k + 50 k and substitute the first L terms of (2.4) into (2.1) to 
obtain, 

fuM, = EQ,(E(-l) l( ' + 1 ,/ 2 +('A0^f^) (2.5) 

n—0 \l=0 ) 

L-l sal /N-l \ 

= E(-l) l,,+,)/2J ^ £(»+<«) . ( 2 . 6 ) 

7=0 \n =0 / 

where the second equality follows by interchanging the order of the summations. 


Corollary 2.1 If f k is as in (2.1) and f k} L,se k as in (2.5), then for any L ^ 1, 

I fk - fk,L,S8 k I < E* l C » I R Ln,S6 k < I kill. <K * < AT - 1. 

n =0 L " 

Proof This follows immediately from applying Lemma 2.1 to each of the terms in (2.1). □ 

Hence, if we approximate 0+%..., 0^ g l by points 0 q ,..., 0^ 1 _ l such that 

• max \d! eg — 071 is sufficiently small (see Section 2.2); and, 

oaa-i 

• 0q ,..., 0^_ l is equally spaced, 

then the number of terms, L, required to achieve an accuracy of \ j\ — f kt L.Se k I ^ £ | |c| 1 1 is small. More¬ 
over, the inner sums of (2.6) can be computed via discrete cosine and sine transforms (see Section 2.3), 
which can be computed in €>(N log/V) via an FFT. This gives us the foundations of a fast algorithm. 
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2.2 Legendre nodes as a perturbation of a Chebyshev-like grid 

The Chebyshev points of the first kind, x k) wb< ,... ,x c ^ k ( are an approximation to x 1 q S ,... ,.%-i which 

satisfy the requirements in the two bullet points above. In particular, letting Q^ hebl = cos -1 (x c , hebl ), one 
can readily show that (Olver et al., 2010, (18.16.3)) 

I K eg — 0£ hebl | ^ 2(fj-+V) ’ 

However, this bound is a little weak. A better bound is given in Lemma 2.2. 

Another possibility is to consider the leading order term of the asymptotic expansion of Legendre 
polynomials of large degree (Stieltjes, 1890): 


Pn (cos 0) 


f~ 4 r(N+ 1 ) cos((A+i)0-f) 
V Tt r (N + I) (2 sin 0) 1 / 2 ’ 


N —> o°. 


The zeros of this leading term are 


(*+?)* 




IV+i 




which are also equally-spaced and provide us with an approximation to Legendre nodes. In fact, multi¬ 
plying the numerator and denominator by a factor of 2 and comparing to (2.3) we see that these points 
correspond precisely to the odd terms (i.e., when the index k is odd) in the (2 N + l)-point Chebyshev 
grid of the first kind. 

The following lemma shows how closely Q£ beb * and 0 c k l,ebl approximate dj, eg , and Figure 2 demon¬ 
strates that the derived bounds are tight. 

Lemma 2.2 Let 0Q e? ,..., 9^ 8 _ l denote the N roots of P^( cos 0). Then, 


max 

oaa-i 


gleg _ gcheb* 


1 

3k(2N + 1) 


1 

67T1V’ 


(2.7) 


and 


max 

0<£sCA-1 


gleg _ gchebi 


3n 2 +2 ^ 0.83845 

^ 6?r(2A+l) ^ N 


(2.8) 


Proof. We first consider \9 l k eg — gcheb t | an( j res t r i c t our attention to 0 ^ k ^ [A^/2J — 1. From (Szego, 
1936, Thm. 6.3.2) we know that Q^ ,eb * < 0 k ". Moreover, by (Brass et al ., 1996, Thm. 3.1 & eqn. (3.2)) 
and cot(x) < 1 /x, we have 


gleg _ gcbebt 


< 


* , ncheb. 

2(2A+1) 2 ° k 




4N + 2 


< 


2{2N+l) 2 (4k + 3)n ^ 3(2N+l)n 


O^k^ [N/2\-l. 


By symmetry of the 9 k s and 9 c k heb *, the same bound holds for k > [N / 2J — 1 when we take the modulus. 

The bound on 9 k ’ g — 0 k ' <h ' | follows from the triangle inequality. That is, for () f k f N — \, we 
have 


gleg _ gcliebi 


< gleg _ gcheb » 


d cheb\ r\cheb* 

k ~ 6 k 




l 

3tz(2N + 1) 


n(2k-N+l) 
2N(2N + 1) ' 
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Fig. 2. The maximum distance between corresponding entries of and 0 cheb ' (upper solid line) and 0 ,es and 0 rhlb ‘ (lower 
solid line) as a function of N. The dashed lines depict the bounds derived in Lemma 2.2, which appear to be tight. 


Since k ^ N — 1, we obtain 


gleg _ Qchebi 


1 

^ 3tt(2N + 1) 


n(N-l) ^ 3n 2 + 2 ^,0.83845 

4N{N+l/2) " 6k(2N +1) ^ N 


O^k^N-l. 


□ 


Corollary 2.2 Choosing 6 t * = Q^ liebl or = G c k l,eb * for 0 ^ k ^ IV — 1 in (2.6), we have 


O^k^N-l 


k,L,se, 


max \fk-f., zncheb, | < 


. (0.83845)^,, 

1^ 1 v ||g||i 

(2.9) 

(6tt)^! Il - 111 ’ 

(2.10) 


respectively. 


Proof. Follows immediately from combining Corollary 2.1 and Lemma 2.2. □ 

We note that in (2.9) and (2.10) the terms multiplying ||c|| i are independent of N, and so the number 
of terms required in (2.6) to obtain a given precision is bounded independently of N. Table 1 shows 
these terms for 1 ^ L ^ 18. Double precision is obtained for L > 9 terms when Of. = Q^ 1 ’* and L > 17 
terms when 9£ = Q cbebl . 


2.3 Computing the discrete cosine and sine transforms 

To complete our fast algorithm for the NDCT we require a fast way to compute the sums contained 
inside the parenthesis of (2.6). In particular, writing (2.6) in vector form, we have 


L— 1 


III = E 


(_1)L(lh)/2J 


t =0 


£! 


D l se DCT(cV 


L -1 

E 

t=i 

odd 


(_l)L(f+l)/2J 

J\ 


D^DST(cW), 


( 2 . 11 ) 
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L 

1 

2 

3 

4 

5 

6 

7 

8 

9 

0.83845 l / L ! 

8.4 x ur 1 

3.5 x nr 1 

9.8 x 10~ 2 

2.1 x 10 -2 

3.5 x nr 3 

4.8 x 10 -4 

5.8 x 10" 5 

6.1 x 10~ 6 

5.6 x 10- 7 

({6n) L V)~ l 

5.3 x ltr 2 

1.4 x 10 -3 

2.5 x 10~ 5 

3.3 x 10- 7 

3.5 x io ~ 9 

3.1 x 10 ~ n 

2.4 x 10~ 13 

1.6 x ur 15 

9.2 x 10~ 18 

L 

10 

11 

12 

13 

14 

15 

16 

17 

18 

0.83845 l / L ! 

4.7 x 10~ 8 

3.6 x nr 9 

2.5 x 10- 10 

1.6 x 10 -11 

9.7 x 10- 13 

5.4 x 10- 14 

2.9 x 10- 15 

1.4 x 10 -16 

6.6 x 10 -18 


4.9 x 10~ 20 

2.3 x 10~ 22 

1.0 x 10~ 24 

4.2 x 10~ 27 

1.6 X 10- 29 

5.7 x 10~ 32 

1.9 x 10~ 34 

5.9 x nr 37 

1.7 x 10 -39 


Table 1. The terms multiplying ||c|| i in (2.9) and (2.10) for 1 ^ L ^ 18. The shaded boxes show the smallest value of L for which 
these fall below single and double precision. 


where [c^]„ = n e c n , n = 0,...,N—l. When 0* = Q^' heb i the DCT and DST take the form 


N -1 

Y cos 

n =0 


n(k+ j)n 
N 


At-1 

Y tl ' C n sin 

n =0 


n(k+ j)7T 
N 


O^k^N-l, 


which are readily related to the DCT-III and DST-III, respectively. When 0* = Qf heb * they become 


N-l 

Y nL C „COS 
n =0 


/ n((2k + 1) 4- 
^ 2N +1 



At-1 

Y n ^ c n sin 

n =0 


/ n((2k +1) + 
( 2N +1 



O^k^N-l, 


which are equivalent to the odd terms of a DCT-III and DST-III of length 2N + 1. Hence, both may be 
evaluated in &(N\ogN) operations. 

Remark 2.1 Although, for an accuracy of double precision, the Qf heb * points require around half the 
number of terms in the Taylor series compared to Qf heb i (see Table 1), we see from above that each 
term will be roughly twice as expensive to evaluate. Hence, we expect little difference between the two 
approaches when there is a 16-digit accuracy goal. 

Remark 2.2 The numerical results in this paper are computed in MATLAB. Although MATLAB 
uses FFTW for FFT computations (Frigo & Johnson, 2005), it does not allow access to FFTW’s DCT 
and DST routines (redftoi and rodftoi, respectively.). 2 Instead, we use the DCT and DST codes 
implemented in Chebfun (Driscoll el ai, 2014), which compute the DCT and DST via a complex¬ 
valued FFT of double the length. Based on our experiments, we believe this to be a factor of 2-4 slower 
than calling redftoi and rodftoi directly. 

Remark 2.3 This section described a Taylor-based NDCT, which is closely related to the Taylor-based 
NFFT described by Kunis (2008). Alternative approaches based on oversampling and low-pass filters 
are described in Dutt & Rokhlin (1993) and Potts (2003). Although for general point distributions the 
oversampled NFFT and NDCT are considered faster than the Taylor-based approaches (Ware, 1998) 
we choose to use a Taylor-based NDCT because (a) it is simple and (b) Lemma 2.2, provides a precise 
bound on the number of terms required in the expansion for a given accuracy (see Table 1). Since the 
required number of terms is small we expect the Taylor-based method to be competitive in this particular 
case. 


“The MATLAB signal processing toolbox has a dct () function, but this is computed via an FFT of double the length. 
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2.4 Direct approach 

To test the fast algorithm above we construct T y(x^) via the three-term recurrence relation: 

T 0 (x) = 1, T\ (x) = x, T n+ 1 (x) = 2xT n (x) - T„_ i (x), n ^ 1. 

Although this approach requires (?[N 2 ) operations to compute Ta r(x leg )c, the memory cost can easily 
be reduced to &{N) by computing the matrix-vector product as a recurrence. Hereinafter, we refer to 
this as the direct approach. 

We note that one could instead compute the NDCT using the closed form expression of T „(cos 0) to 
construct Ti\/(x leg ) = cos (0 /eg [O,1,... ,1V — 1]) and then evaluate the matrix-vector product / = Tat(x ,<! £)c. 
However, such a closed form is not available when we come to compute Py (x leg ) in the next section. 

2.5 Numerical results for the NDCT 

All the numerical results were performed on a 3.40GHz Intel Core i7-2600 PC with MATLAB 2015a 
in Ubuntu Linux. The vector of transformed nodes 0 leg is computed with the legpts () command in 
Chebfun (Driscoll et al ., 2014, v5.2) which uses Bogaert’s algorithm (Bogaert, 2014). As discussed in 
Remark 2.2, DCTs and DSTs are computed using the chebfun. dct () and chebfun. dst () routines. 
MATLAB codes for reproducing all of the results contained within this paper are available online (Hale 
& Townsend, 2015). 

As a first test we take randomly distributed vectors c with various rates of decay and compare 
the accuracy of both the direct and FFT-based approaches against an extended precision computation 
(Figure 3). 3 The results for 9 cheh ' and 9 cheb * in the FFT-based approach are virtually indistinguishable 
so we show only the former. We find the FFT-based approaches are more accurate than the direct 
approach, despite our algorithm involving many significant approximations. In many applications the 
Chebyshev expansion in (2.1) is derived by an approximation of a smooth function. In this setting, 
if the function is Holder continuous with a > 1/2, we observe that that the FFT-based algorithm has 
essentially no error growth with N. 

Our second test investigates the time taken to compute a real-valued NDCT of length N. Figure 4 
compares the direct approach (squares) and the FFT-based approach with Q cheb i (triangles) and 9 cheb * 
(circles), in the range 100 ^ N ^ 10,000. The variation of computational times between consecutive 
values of N is typical of FFT-based algorithms, where the theoretical cost of the FFT as well as the 
precise algorithm employed by FFTW depends on the prime factorization of N. The triangles and 
circles lie on a piecewise linear least squares fit to the computational timings, to suggest some kind of 
‘average’ time for near-by values of N. The variation also makes it difficult to say when the FFT-based 
algorithm becomes more efficient than the direct approach, but it occurs around N = 1,000. 

3. The discrete Legendre transform 

Our procedure for computing the DLT in (1.1) splits naturally into two stages. The first deals with the 
non-uniform frequency present in P „(cos 9) and converts the transform to one involving 7),(cos 9) with 
uniform frequency in 9. The second stage deals with the non-uniform grid cos (9q S ), ..., cos( 9 ^ g ,) and 
is the NDCT from the previous section. 

J In particular, the vector corresponding to, say, N = 50 with t/(n 0-5 ) decay can be reproduced exactly by the MATLAB code 

rng(0, ' twister' ); c = rand(50,1) ./sqrt(1:50) .';. 
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Fig. 3. Errors in computing the NDCT of vectors with various rates of decay using the direct method (left) and the new FFT-based 
approach with 6* — Q ^ heb i (right). In both cases, a vector of length N is created using randn (N, 1 ) in MATLAB and then scaled 
so that the nth entry is ^(rc°), ^(ft -0-5 ), and giving the four curves in each panel. The dashed lines depict 

heuristic observations of the error growth in each case. The log factors may seem somewhat arbitrary, but the lines give a much 
better fit to the data when these are included. 


Denote by M the N xN upper-triangular matrix given by 


M k „ 


1 r(n/ 2 + 1 / 2 ) 
n r(n/ 2+1) ’ 

2 r((n-k)/ 2+1/2) 
n r((n+k)/ 2+1) > 


0 = k ^ N — 1, n even, 
0<k^n^N—l, k + n even, 


0 , 


otherwise. 


If c is the vector of Legendre coefficients in (1.1) and c = Me, then we have (Alpert & Rokhlin, 1991) 


N -1 N-l 

fk = L C n P n (x[ eS ) = £ £nT n {x l k eg ), O^k^N-1. (3.1) 

n=0 n =0 

Now, the summation on the righthand side takes the form of the NDCT (2.1) and we may use the 
algorithm described in Section 2.3 to compute / in (7{N log/V) operations. 

To compute c directly, i.e. via the matrix-vector product Me, requires C(/V 2 ) operations. Instead, 
we note that the transform £ = Me can be computed in fj (N(\ogN) 2 / loglog/V) operations using the 
FFT-based algorithm described in (Hale & Townsend, 2014) or by the fast multipole-based approach 
in (Alpert & Rokhlin, 1991). In this paper we use the algorithm in (Hale & Townsend, 2014) because it 
has no precomputational cost and so allows for adaptive discretizations. 

Another convenient property of the transforms in this paper is that there is an asymptotically optimal 
selection of algorithmic parameters for any working tolerance. In the NDCT, the number of terms in 
the Taylor series expansion can be precisely determined based on the working tolerance. We would like 
this to also hold for the CfA'OogA'') 2 /loglogA) Chebyshev-Legendre transform described in (Hale & 
Townsend, 2014). We remark that though the paper considered only a working tolerance of machine 
precision throughout, the algorithmic parameters were derived and determined in terms of e > 0. We 
slightly modified our implementation to exploit arbitrary working tolerances. 
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a 

H 


10 2 10 3 10 4 
N 



Fig. 4. Time to compute a NDCT of various lengths using the direct approach (squares) and FFT-based approaches from Section 2 
with x chebl (triangles) and x rbf ' h - (circles). For larger values of N the two FFT-based approaches require a comparable time, and 
the crossover with the direct approach occurs at around N = 1,000. We believe the jump in the x lh, ’ b ‘ time (circles) at around 
N = 650 is caused by a switch in algorithmic details of the FFT routine. 


3.1 Numerical results for the discrete Legendre transform 

First, we take random vectors c with normally distributed entries and then scale them to have algebraic 
rates of decay 0(n°), ^(n^ 05 ), f/(n '),and ^(n~ L5 ), as in Section 2.5. The accuracy of both the direct 
and FFT-based approaches is compared against an extended precision computation. The results are 
shown in Figure 5 and one can see that the errors for the direct and FFT-based approach are comparable. 
In most practical applications the Legendre coefficients are associated to a smooth function so they 
decay at least algebraically. 

While we achieve a similar error to the direct approach, our FFT-based approach has a lower com¬ 
plexity and are more computationally efficient for large N. Figure 6 shows a comparison of the compu¬ 
tational times for the direct and FFT-based approach. When comparing with Figure 4 we see that the 
computational time for the DLT is dominated by the leg2cheb transformation. For an accuracy goal of 
double precision, the crossover between the direct and FFT-based approach is approximately N = 5 ,000. 

4. The inverse discrete Legendre transform 

We now turn to the IDLT, which can be seen as taking values of a function on an /V-point Legen¬ 
dre grid and returning the corresponding Legendre coefficients of the degree N — l polynomial inter- 
polant. The basis of the algorithm we propose is analogous to the forward transform and has the same 
Lt(N(\ogN) 2 / loglog/V) complexity. Unlike most NFFT algorithms which resort to an iterative ap¬ 
proach for computing the inverse transform (Anderson & Dahleh, 1996; Dutt & Rokhlin, 1993), here 
we take advantage of the orthogonality of Legendre polynomials, and in particular the exactness of 
Gauss-Legendre quadrature, to derive a direct method for the IDLT. 

It is convenient to work with matrix notation rather than summations, and we denote the N x N 
matrix with (j,k) entries P* (xj g ) as P^(x ,e ^) and the N xN with ( j,k ) entry T^x 1 ^) as Ta r(x leg ). Us¬ 
ing the orthogonality of Legendre polynomials and the exactness of Gauss-Legendre quadrature, the 
IDLT (1.2) can then be conveniently written as 


c = P ^ 1 (x leg )f = D s V T N {^)D, eg f, 


(4.1) 
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Fig. 5. Errors in computing the DLT of vectors with various rates of decay using the direct approach (left) and the FFT-based 
approach (right). In both cases, a vector of length N is created using randn (N, 1 ) in MATLAB and then scaled so that the 
/7th entry is ^(w -0-5 ), ^(n _1 ), and ^(w -1-5 ), giving the four curves in each panel. The dashed lines depict heuristic 

observations of the error growth in each case. 


where s = (||-Po| I 2 2 ’ - • • ’ ll-Gv-i II 2 2 ) T and w leg is the vector of Gauss-Legendre quadrature weights. 
From (3.1) we know that, for any vector c, Pa^x^c = Ta^x^c = Ta i(x Ie8 )Mc, and hence we have the 
matrix decomposition P^ix 168 ) = T]\/(x ,eg )M. Substituting this into (4.1) we obtain 

c = D s M T T T N (x* e8 )D„f. 

Thus, c can also be computed in i^(lV(loglV) 2 /loglog.iV) operations because: 

1. D w and D s are diagonal matrices, so can be readily be applied to a vector in f/(N) operations, 

2. T T N (x leg ) is precisely the transpose of the NDCT in Section 2. Since the transpose of the DCTs 
and DSTs in Section 2.3 can themselves be expressed in terms of DCTs and DSTs (of type-II), 
?W es ) can be approximated in fj : (N log A') operations by taking the transpose of (2.11). 

3. The matrix-vector product with M T is closely related to the Chebyshev-Legendre transform and 
can be computed in <^(lV(log.iV) 2 /loglog.iV) operations by a slight modification of the algorithm 
in (Hale & Townsend, 2014). (Alternatively, one could use the ideas in (Alpert & Rokhlin, 1991).) 


4.1 Numerical results for the inverse discrete Legendre transform 

For the inverse transform, the direct approach requires 0(N 2 ) operations to compute the IDLT of length 
N, rather than the naive ff(N 2 ) standard inversion algorithm. This is because the direct approach 
we have implemented exploits the discrete orthogonality relation that holds for Legendre polynomi¬ 
als (see (4.1)). The matrix-vector product with p W eg ) in (4.1) is computed by using the three-term 
recurrence relations satisfied by Legendre polynomials, except now along rows instead of columns. 

For our first numerical experiment we take randomly generated vectors with normally distributed 
entries and compare the accuracy of both the direct and FFT-based approach against an extended preci¬ 
sion computation (see Figure 7, left). Unlike in Section 2.5 and Section 3.1 we do not consider decay 
in these vectors, as they will typically correspond to function values in space where there is no reason 
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N 


Fig. 6. Time taken to compute the DLT of length N using the direct approach (squares) and FFT-based approach (triangles). Here 
we choose 1000 logarithmically spaced values of N in the range 10 3 —10 5 . The larger squares and rectangles are every 50th one of 
these. We omit the results for the FFT-based approach using the points x rhf ' h as they are indistinguishable from those of jf bebl . 


to expect decay. We find that the errors incurred by our FFT-based IDLT are consistent with the direct 
approach. 

In Figure 7 (right) we show the execution time for computing the IDLT with the direct approach 
(squares) and FFT-based approach (triangles). As with the DLT, our IDLT algorithm is more computa¬ 
tionally efficient than the direct approach when N ^ 5,000. 

Conclusion 

We have presented fast and simple algorithms for the discrete Legendre transform and its inverse, which 
rely on a nonuniform discrete cosine transform and the Chebyshev-Legendre transform. Both com¬ 
ponents are based on the fast Fourier transform and have no precomputational cost. For an A'-point 
transform both algorithms have a complexity of i^(A(logA) 2 /loglogA) operations and are faster than 
the direct approach when N ^ 5,000. MATLAB code to compute the transformations (and all the results 
contained in this paper) are available online (Hale & Townsend, 2015). The codes are also accessible in 
Chebfun (Driscoll et al., 2014) via chebf un . dlt () and chebf un . idlt (), respectively. As part of the 
analysis of the algorithm we derived a bound on the distance between associated points in a Chebyshev 
and Legendre grid of the same size (see Lemma 2.2), which we believe to be tighter than any similar 
bounds appearing in the literature. 
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Right: Time taken to compute the IDLT of length N using the direct approach (squares) and FFT-based approach (triangles). 
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